Root Mean Squared Logarithmic Error (RMSLE) — Regression Metric (From Scratch)#

RMSLE measures error in log space: it is the RMSE between \(\log(1 + y)\) and \(\log(1 + \hat y)\).

It is most useful when targets are non-negative, span orders of magnitude, and you care about multiplicative / percentage-like errors.

Goals

  • Build intuition with numeric examples + Plotly visuals

  • Write RMSLE/MSLE in clear notation (including domain constraints)

  • Implement root_mean_squared_log_error in NumPy (from scratch) and validate vs scikit-learn

  • Show how RMSLE naturally leads to optimizing a model on a log1p-transformed target

  • Summarize pros/cons, good use cases, and common pitfalls

Quick import#

from sklearn.metrics import root_mean_squared_log_error

Equivalent for single-output targets: np.sqrt(mean_squared_log_error(...)). (With multiple outputs, averaging before vs after the square root gives different numbers.)

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

from sklearn.metrics import (
    mean_squared_error,
    mean_squared_log_error,
    root_mean_squared_error,
    root_mean_squared_log_error,
)

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)

rng = np.random.default_rng(42)

Prerequisites#

  • Regression setup: true targets \(y\) and predictions \(\hat y\)

  • Logarithms and the log1p / expm1 trick:

    • log1p(y) = log(1 + y) is numerically stable for \(y\) near 0, where forming 1 + y directly in floating point would lose precision

    • expm1(z) = exp(z) - 1 is the inverse of log1p
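Both properties are easy to see directly (a small NumPy sketch):

```python
import numpy as np

y = 1e-12

# naive form: 1 + y rounds in float64 before the log, losing digits
naive = np.log(1.0 + y)

# log1p avoids forming 1 + y explicitly, so tiny y survives
stable = np.log1p(y)

# expm1 inverts log1p up to float rounding
roundtrip = np.expm1(np.log1p(y))
```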

1) Definition and notation#

Given \(n\) samples with non-negative targets \(y_i \ge 0\) and predictions \(\hat y_i \ge 0\), define the log-transformed values:

\[t_i = \log(1 + y_i), \qquad \hat t_i = \log(1 + \hat y_i)\]

The mean squared logarithmic error (MSLE) is:

\[\mathrm{MSLE}(y, \hat y) = \frac{1}{n}\sum_{i=1}^n (\hat t_i - t_i)^2 = \frac{1}{n}\sum_{i=1}^n \left(\log(1 + \hat y_i) - \log(1 + y_i)\right)^2\]

The root mean squared logarithmic error (RMSLE) is:

\[\mathrm{RMSLE}(y, \hat y) = \sqrt{\mathrm{MSLE}(y, \hat y)}\]

Weighted variant with sample weights \(w_i \ge 0\):

\[\mathrm{MSLE}_w = \frac{\sum_{i=1}^n w_i (\hat t_i - t_i)^2}{\sum_{i=1}^n w_i}, \qquad \mathrm{RMSLE}_w = \sqrt{\mathrm{MSLE}_w}\]

Key identity (what makes this metric convenient):

\[\mathrm{RMSLE}(y, \hat y) = \mathrm{RMSE}(\log(1+y), \log(1+\hat y))\]
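The identity can be verified numerically in a couple of lines (pure-NumPy sketch; the `rmse` helper is just the textbook formula):

```python
import numpy as np

def rmse(y_true, y_pred):
    # plain root mean squared error
    return np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

rng = np.random.default_rng(0)
y_true = rng.uniform(0.0, 100.0, size=50)
y_pred = rng.uniform(0.0, 100.0, size=50)

# RMSLE from its definition
rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# ... equals plain RMSE on the log1p-transformed values
rmsle_via_rmse = rmse(np.log1p(y_true), np.log1p(y_pred))
```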

Notes:

  • log is the natural logarithm; using another base just scales the metric by a constant.

  • For multi-output regression, implementations typically compute RMSLE per output and then average.

2) Domain constraints and edge cases#

  • Non-negativity: Most definitions (and scikit-learn) require \(y \ge 0\) and \(\hat y \ge 0\).

  • Zeros are fine: log1p(0) = 0, which is why log(1 + y) is used instead of log(y).

  • Negative predictions: A linear model can output negative values; for RMSLE you typically either

    • use a model that enforces \(\hat y \ge 0\) (e.g., predict in log space), or

    • clip: \(\hat y \leftarrow \max(\hat y, 0)\) at evaluation time (common in practice).

  • Near zero, it behaves like squared error: for small \(y\), \(\log(1+y) \approx y\).

  • For large values, it behaves like a squared log-ratio (roughly relative) error: for large \(y\), \(\log(1+y) \approx \log(y)\).
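The clipping option above takes a few lines (sketch; the `-0.3` stands in for a raw linear-model output):

```python
import numpy as np

y_true = np.array([0.0, 1.0, 5.0])
y_pred = np.array([-0.3, 1.2, 4.0])  # -0.3 is outside the RMSLE domain

# clip into the valid domain before computing the metric
y_pred_clipped = np.clip(y_pred, 0.0, None)

rmsle = np.sqrt(np.mean((np.log1p(y_pred_clipped) - np.log1p(y_true)) ** 2))
```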

vals = np.array([0.0, 0.1, 1.0, 10.0, 100.0])
pd.DataFrame(
    {
        "y": vals,
        "log1p(y)": np.log1p(vals),
        "expm1(log1p(y))": np.expm1(np.log1p(vals)),
    }
)
y log1p(y) expm1(log1p(y))
0 0.0 0.000000 0.0
1 0.1 0.095310 0.1
2 1.0 0.693147 1.0
3 10.0 2.397895 10.0
4 100.0 4.615121 100.0

3) Intuition: RMSLE cares about ratios (mostly)#

For large targets, the +1 becomes negligible and:

\[\log(1 + \hat y) - \log(1 + y) \approx \log(\hat y) - \log(y) = \log\left(\frac{\hat y}{y}\right)\]

So for large \(y\), a prediction that is off by a factor of \(c\) (i.e., \(\hat y = c y\)) has error approximately:

\[\left(\log(c)\right)^2\]

This means:

  • Overpredicting by \(\times 2\) and underpredicting by \(\div 2\) have the same penalty (because \(\log(2)\) and \(\log(1/2) = -\log(2)\) square to the same value).

  • The metric is much less dominated by very large targets than RMSE/MSE.

For small targets, \(\log(1+y) \approx y\), so the metric behaves closer to squared error on the original scale.

ratios = np.logspace(-2, 2, 500)  # 0.01 .. 100
y_trues = [0.1, 1.0, 10.0, 100.0, 1000.0]

parts = []
for y in y_trues:
    y_pred = ratios * y
    parts.append(
        pd.DataFrame(
            {
                "ratio": ratios,
                "sq_log_error": (np.log1p(y_pred) - np.log1p(y)) ** 2,
                "series": f"y_true={y:g}",
            }
        )
    )

# large-y approximation: (log ratio)^2
parts.append(
    pd.DataFrame(
        {
            "ratio": ratios,
            "sq_log_error": (np.log(ratios)) ** 2,
            "series": "(log ratio)^2 (large-y approx)",
        }
    )
)

df_ratio = pd.concat(parts, ignore_index=True)

fig = px.line(
    df_ratio,
    x="ratio",
    y="sq_log_error",
    color="series",
    log_x=True,
    title="Per-sample squared log error vs multiplicative ratio",
    labels={
        "ratio": "ratio = y_pred / y_true",
        "sq_log_error": "(log1p(y_pred) - log1p(y_true))^2",
        "series": "curve",
    },
)
fig.add_vline(x=1.0, line_dash="dash", line_color="black")
fig.show()

4) A tiny worked example#

We’ll compute RMSLE step-by-step and compare against scikit-learn.

y_true = np.array([0.0, 1.0, 10.0, 100.0])
y_pred = np.array([0.0, 2.0, 8.0, 120.0])

t_true = np.log1p(y_true)
t_pred = np.log1p(y_pred)
diff = t_pred - t_true

msle = float(np.mean(diff**2))
rmsle = float(np.sqrt(msle))

print("t_true:", t_true)
print("t_pred:", t_pred)
print("diff:", diff)
print("MSLE:", msle)
print("RMSLE:", rmsle)

print("sklearn MSLE:", mean_squared_log_error(y_true, y_pred))
print("sklearn RMSLE:", root_mean_squared_log_error(y_true, y_pred))
t_true: [0.     0.6931 2.3979 4.6151]
t_pred: [0.     1.0986 2.1972 4.7958]
diff: [ 0.      0.4055 -0.2007  0.1807]
MSLE: 0.05932808530023383
RMSLE: 0.24357357266385413
sklearn MSLE: 0.05932808530023383
sklearn RMSLE: 0.24357357266385413
df_example = pd.DataFrame(
    {
        "i": np.arange(len(y_true)),
        "y_true": y_true,
        "y_pred": y_pred,
        "log1p(y_true)": t_true,
        "log1p(y_pred)": t_pred,
        "sq_log_error": diff**2,
    }
)

fig = px.bar(
    df_example,
    x="i",
    y="sq_log_error",
    hover_data=["y_true", "y_pred", "log1p(y_true)", "log1p(y_pred)"],
    title="Per-sample MSLE contribution (squared log error)",
    labels={"i": "sample index", "sq_log_error": "(log1p(y_pred) - log1p(y_true))^2"},
)
fig.show()

5) RMSLE vs RMSE: what changes when you take logs?#

Consider targets that span orders of magnitude.

  • With RMSE, a fixed relative error (say +20%) produces much larger absolute residuals for large targets, so large targets dominate the metric.

  • With RMSLE, a fixed relative error produces approximately the same log residual, so the contributions are more balanced.

y_true_scale = np.array([1.0, 10.0, 100.0, 1000.0])

# Scenario A: same relative error (20% over)
y_pred_rel = 1.2 * y_true_scale

# Scenario B: same absolute error (+10)
y_pred_abs = y_true_scale + 10.0

def sq_error(y_t, y_p):
    return (y_p - y_t) ** 2

def sq_log_error(y_t, y_p):
    return (np.log1p(y_p) - np.log1p(y_t)) ** 2

df_scale = pd.concat(
    [
        pd.DataFrame(
            {
                "scenario": "20% over",
                "y_true": y_true_scale,
                "y_pred": y_pred_rel,
                "squared error": sq_error(y_true_scale, y_pred_rel),
                "squared log error": sq_log_error(y_true_scale, y_pred_rel),
            }
        ),
        pd.DataFrame(
            {
                "scenario": "+10 absolute",
                "y_true": y_true_scale,
                "y_pred": y_pred_abs,
                "squared error": sq_error(y_true_scale, y_pred_abs),
                "squared log error": sq_log_error(y_true_scale, y_pred_abs),
            }
        ),
    ],
    ignore_index=True,
)

df_long = df_scale.melt(
    id_vars=["scenario", "y_true", "y_pred"],
    value_vars=["squared error", "squared log error"],
    var_name="term",
    value_name="contribution",
)

fig = px.bar(
    df_long,
    x="y_true",
    y="contribution",
    color="term",
    barmode="group",
    facet_col="scenario",
    log_y=True,
    title="Per-sample contributions: RMSE/MSE vs RMSLE/MSLE",
    labels={"y_true": "target (y_true)", "contribution": "contribution (log scale)"},
)
fig.show()

for name, yp in [("20% over", y_pred_rel), ("+10 absolute", y_pred_abs)]:
    rmse = root_mean_squared_error(y_true_scale, yp)
    rmsle = root_mean_squared_log_error(y_true_scale, yp)
    print(f"{name:>12} | RMSE={rmse:.4f} | RMSLE={rmsle:.4f}")
    20% over | RMSE=100.5038 | RMSLE=0.1603
+10 absolute | RMSE=10.0000 | RMSLE=0.9536

6) NumPy implementation (from scratch)#

We’ll implement MSLE and RMSLE with scikit-learn-like handling:

  • 1D and 2D targets ((n_samples,) or (n_samples, n_outputs))

  • Optional sample_weight

  • multioutput ∈ {"raw_values", "uniform_average"} or explicit output weights

def _as_2d(y):
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        return y.reshape(-1, 1)
    if y.ndim == 2:
        return y
    raise ValueError("y must be 1D or 2D (n_samples,) or (n_samples, n_outputs).")


def _check_non_negative(y, *, name):
    if np.any(y < 0):
        raise ValueError(f"{name} contains negative values; RMSLE/MSLE require y >= 0.")


def mean_squared_log_error_np(y_true, y_pred, *, sample_weight=None, multioutput="uniform_average"):
    """Mean squared logarithmic error (MSLE).

    MSLE(y, y_hat) = mean((log1p(y_hat) - log1p(y))^2)
    """
    y_true_2d = _as_2d(y_true)
    y_pred_2d = _as_2d(y_pred)

    if y_true_2d.shape != y_pred_2d.shape:
        raise ValueError(f"shape mismatch: y_true{y_true_2d.shape} vs y_pred{y_pred_2d.shape}")

    _check_non_negative(y_true_2d, name="y_true")
    _check_non_negative(y_pred_2d, name="y_pred")

    t_true = np.log1p(y_true_2d)
    t_pred = np.log1p(y_pred_2d)
    residual = t_pred - t_true

    if sample_weight is None:
        msle_per_output = np.mean(residual**2, axis=0)
    else:
        w = np.asarray(sample_weight, dtype=float)
        if w.ndim != 1:
            raise ValueError("sample_weight must be 1D of shape (n_samples,).")
        if w.shape[0] != y_true_2d.shape[0]:
            raise ValueError("sample_weight length must match n_samples.")
        w = w.reshape(-1, 1)
        msle_per_output = np.sum(w * residual**2, axis=0) / np.sum(w, axis=0)

    if multioutput == "raw_values":
        return msle_per_output
    if multioutput == "uniform_average":
        return float(np.mean(msle_per_output))

    weights = np.asarray(multioutput, dtype=float)
    if weights.shape != (msle_per_output.shape[0],):
        raise ValueError("multioutput weights must match n_outputs.")
    return float(np.average(msle_per_output, weights=weights))


def root_mean_squared_log_error_np(
    y_true, y_pred, *, sample_weight=None, multioutput="uniform_average"
):
    """Root mean squared logarithmic error (RMSLE): sqrt(MSLE)."""
    msle_per_output = mean_squared_log_error_np(
        y_true,
        y_pred,
        sample_weight=sample_weight,
        multioutput="raw_values",
    )
    rmsle_per_output = np.sqrt(msle_per_output)

    if multioutput == "raw_values":
        return rmsle_per_output
    if multioutput == "uniform_average":
        return float(np.mean(rmsle_per_output))

    weights = np.asarray(multioutput, dtype=float)
    if weights.shape != (rmsle_per_output.shape[0],):
        raise ValueError("multioutput weights must match n_outputs.")
    return float(np.average(rmsle_per_output, weights=weights))
y_true_rand = rng.lognormal(mean=1.2, sigma=0.9, size=(60, 3))
y_pred_rand = y_true_rand * rng.lognormal(mean=0.0, sigma=0.3, size=y_true_rand.shape)

print("ours raw:", root_mean_squared_log_error_np(y_true_rand, y_pred_rand, multioutput="raw_values"))
print("sk   raw:", root_mean_squared_log_error(y_true_rand, y_pred_rand, multioutput="raw_values"))

sample_w = rng.uniform(0.5, 2.0, size=y_true_rand.shape[0])
print("ours weighted:", root_mean_squared_log_error_np(y_true_rand, y_pred_rand, sample_weight=sample_w))
print("sk   weighted:", root_mean_squared_log_error(y_true_rand, y_pred_rand, sample_weight=sample_w))

assert np.allclose(
    root_mean_squared_log_error_np(y_true_rand, y_pred_rand, multioutput="raw_values"),
    root_mean_squared_log_error(y_true_rand, y_pred_rand, multioutput="raw_values"),
)
assert np.isclose(
    root_mean_squared_log_error_np(y_true_rand, y_pred_rand, sample_weight=sample_w),
    root_mean_squared_log_error(y_true_rand, y_pred_rand, sample_weight=sample_w),
)

# Negative values should raise (to match sklearn)
try:
    root_mean_squared_log_error_np([0.0, 1.0], [0.0, -0.1])
except ValueError as e:
    print("caught:", e)
ours raw: [0.2174 0.2333 0.2297]
sk   raw: [0.2174 0.2333 0.2297]
ours weighted: 0.23046738688656432
sk   weighted: 0.23046738688656432
caught: y_pred contains negative values; RMSLE/MSLE require y >= 0.

7) RMSLE as an objective: gradients and optimization#

Because the square root is monotonic, minimizing RMSLE is equivalent to minimizing MSLE.

Let \(\Delta_i = \log(1+\hat y_i) - \log(1+y_i)\). Then:

\[\mathrm{MSLE} = \frac{1}{n}\sum_{i=1}^n \Delta_i^2\]

Derivative w.r.t. a prediction \(\hat y_i\) (for \(\hat y_i > -1\)):

\[\frac{\partial\,\mathrm{MSLE}}{\partial\hat y_i} = \frac{2}{n}\,\Delta_i\,\frac{1}{1+\hat y_i}\]

For RMSLE:

\[\frac{\partial\,\mathrm{RMSLE}}{\partial\hat y_i} = \frac{1}{n\,\mathrm{RMSLE}}\,\Delta_i\,\frac{1}{1+\hat y_i}\]

Practical takeaway:

  • There is an extra factor \(\frac{1}{1+\hat y_i}\), so gradients are larger for small predictions.

  • A very common training trick is to optimize in log space: fit a model to \(t = \log(1+y)\) using standard squared error, then transform back with expm1.
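The MSLE gradient formula can be sanity-checked against central finite differences (a small sketch; the step size `h` is a conventional float64 choice):

```python
import numpy as np

def msle(y_true, y_pred):
    return np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)

def msle_grad(y_true, y_pred):
    # analytic gradient: (2/n) * Delta_i / (1 + y_pred_i)
    n = y_true.shape[0]
    delta = np.log1p(y_pred) - np.log1p(y_true)
    return (2.0 / n) * delta / (1.0 + y_pred)

y_true = np.array([1.0, 10.0, 100.0])
y_pred = np.array([2.0, 8.0, 120.0])

ana_grad = msle_grad(y_true, y_pred)

# central finite differences, one coordinate at a time
h = 1e-6
num_grad = np.zeros_like(y_pred)
for i in range(len(y_pred)):
    up, dn = y_pred.copy(), y_pred.copy()
    up[i] += h
    dn[i] -= h
    num_grad[i] = (msle(y_true, up) - msle(y_true, dn)) / (2.0 * h)
```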

# Synthetic data with multiplicative noise (log-normal in y)
n = 400
x = rng.uniform(0.0, 6.0, size=n)

# True relationship in log1p-space
t = 1.5 + 1.0 * x + rng.normal(0.0, 0.35, size=n)  # t = log1p(y)
y = np.expm1(t)

# Train/test split
perm = rng.permutation(n)
cut = int(0.8 * n)
tr, te = perm[:cut], perm[cut:]

x_tr, y_tr = x[tr], y[tr]
x_te, y_te = x[te], y[te]
fig = px.scatter(
    x=x_tr,
    y=y_tr,
    opacity=0.7,
    title="Synthetic regression data (y spans a wide range)",
    labels={"x": "feature x", "y": "target y"},
)
fig.update_yaxes(type="log")
fig.show()
def predict_linear(x, w, b):
    x = np.asarray(x, dtype=float)
    return w * x + b


def fit_linear_mse_gd(x, y, *, lr=5e-4, steps=600):
    """Fit y ≈ w x + b by minimizing MSE on y (gradient descent)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    w = 0.0
    b = 0.0
    n = x.shape[0]

    hist = {"mse": [], "rmsle": [], "w": [], "b": []}

    for _ in range(steps):
        y_hat = predict_linear(x, w, b)
        r = y_hat - y

        mse = float(np.mean(r**2))

        # RMSLE isn't defined for negative predictions in sklearn; clip for evaluation.
        y_hat_clip = np.maximum(y_hat, 0.0)
        rmsle = float(root_mean_squared_log_error_np(y, y_hat_clip))

        grad_w = (2.0 / n) * float(np.dot(r, x))
        grad_b = (2.0 / n) * float(np.sum(r))

        w -= lr * grad_w
        b -= lr * grad_b

        hist["mse"].append(mse)
        hist["rmsle"].append(rmsle)
        hist["w"].append(w)
        hist["b"].append(b)

    return w, b, hist


def fit_log1p_mse_gd(x, y, *, lr=0.05, steps=600):
    """Fit log1p(y) ≈ w x + b (equivalent to optimizing MSLE/RMSLE)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.log1p(y)

    w = 0.0
    b = 0.0
    n = x.shape[0]

    hist = {"mse_log": [], "mse_y": [], "rmsle": [], "w": [], "b": []}

    for _ in range(steps):
        t_hat = predict_linear(x, w, b)  # model predicts log1p(y)
        r = t_hat - t

        mse_log = float(np.mean(r**2))

        y_hat = np.expm1(t_hat)
        y_hat = np.maximum(y_hat, 0.0)

        mse_y = float(np.mean((y_hat - y) ** 2))
        rmsle = float(root_mean_squared_log_error_np(y, y_hat))

        grad_w = (2.0 / n) * float(np.dot(r, x))
        grad_b = (2.0 / n) * float(np.sum(r))

        w -= lr * grad_w
        b -= lr * grad_b

        hist["mse_log"].append(mse_log)
        hist["mse_y"].append(mse_y)
        hist["rmsle"].append(rmsle)
        hist["w"].append(w)
        hist["b"].append(b)

    return w, b, hist
w_y, b_y, hist_y = fit_linear_mse_gd(x_tr, y_tr)
w_t, b_t, hist_t = fit_log1p_mse_gd(x_tr, y_tr)

y_hat_te_mse = np.maximum(predict_linear(x_te, w_y, b_y), 0.0)
y_hat_te_log = np.maximum(np.expm1(predict_linear(x_te, w_t, b_t)), 0.0)

print("Test RMSLE (fit MSE on y):   ", root_mean_squared_log_error_np(y_te, y_hat_te_mse))
print("Test RMSLE (fit on log1p(y)):", root_mean_squared_log_error_np(y_te, y_hat_te_log))

print("Test RMSE  (fit MSE on y):   ", root_mean_squared_error(y_te, y_hat_te_mse))
print("Test RMSE  (fit on log1p(y)):", root_mean_squared_error(y_te, y_hat_te_log))
Test RMSLE (fit MSE on y):    1.4927333701234642
Test RMSLE (fit on log1p(y)): 0.34749067593865945
Test RMSE  (fit MSE on y):    358.24267736906455
Test RMSE  (fit on log1p(y)): 155.56476300415895
df_hist = pd.DataFrame(
    {
        "step": np.arange(len(hist_y["rmsle"])),
        "RMSLE (fit MSE on y)": hist_y["rmsle"],
        "RMSLE (fit on log1p(y))": hist_t["rmsle"],
    }
)
df_hist_long = df_hist.melt(id_vars="step", var_name="model", value_name="rmsle")

fig = px.line(
    df_hist_long,
    x="step",
    y="rmsle",
    color="model",
    title="Training curves (RMSLE evaluated on the train set)",
    labels={"rmsle": "RMSLE"},
)
fig.show()
df_pred = pd.DataFrame(
    {
        "y_true": np.concatenate([y_te, y_te]),
        "y_pred": np.concatenate([y_hat_te_mse, y_hat_te_log]),
        "model": np.repeat(
            ["fit MSE on y (linear)", "fit on log1p(y)"],
            repeats=len(y_te),
        ),
    }
)

eps = 1e-6
min_v = float(np.minimum(df_pred["y_true"].min(), df_pred["y_pred"].min()))
max_v = float(np.maximum(df_pred["y_true"].max(), df_pred["y_pred"].max()))
min_v = max(min_v, eps)

fig = px.scatter(
    df_pred,
    x="y_true",
    y="y_pred",
    color="model",
    opacity=0.7,
    title="Test predictions: y_true vs y_pred",
    labels={"y_true": "true y", "y_pred": "predicted y"},
)
fig.add_trace(
    go.Scatter(
        x=[min_v, max_v],
        y=[min_v, max_v],
        mode="lines",
        name="y = x",
        line=dict(color="black", dash="dash"),
    )
)
fig.update_xaxes(type="log")
fig.update_yaxes(type="log")
fig.show()

8) Practical usage notes (scikit-learn)#

  • If you want to optimize for RMSLE, a common baseline is:

    1. transform targets with log1p

    2. fit a standard regression model

    3. invert predictions with expm1

  • To avoid invalid values, clip predictions to \(\hat y \ge 0\) before computing RMSLE.

Scikit-learn provides TransformedTargetRegressor to make the log/exp transform explicit.

from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

X_tr = x_tr.reshape(-1, 1)
X_te = x_te.reshape(-1, 1)

model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X_tr, y_tr)

y_pred_te = model.predict(X_te)
y_pred_te = np.clip(y_pred_te, 0.0, None)

print("sklearn RMSLE:", root_mean_squared_log_error(y_te, y_pred_te))
sklearn RMSLE: 0.3474906789145831

9) Pros, cons, and when to use RMSLE#

Pros

  • Focuses on multiplicative errors: being off by a factor matters more than being off by a constant

  • Handles targets spanning orders of magnitude (less dominated by large absolute values)

  • Natural when noise is approximately log-normal / heteroscedastic (variance grows with the mean)

  • Easy to optimize by modeling \(\log(1+y)\) and using squared error there

Cons

  • Requires non-negative targets and predictions (not suitable when \(y\) can be negative)

  • Can overweight small targets: mistakes near zero matter a lot

  • Reported value is in log units (less directly interpretable than RMSE/MAE)

  • If you train in log space and then invert with expm1, the back-transformed predictions estimate a conditional median rather than the conditional mean, so they are biased low when the noise is right-skewed
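That last point shows up clearly on simulated data: with log-normal targets, averaging in log1p space and inverting with expm1 lands near the median of \(y\), well below its mean (a sketch under that noise assumption; the +1 adds a small shift, but the qualitative picture holds):

```python
import numpy as np

rng = np.random.default_rng(0)

# log-normal targets: log(y) ~ Normal(mu, sigma), so median(y) = exp(mu)
# and mean(y) = exp(mu + sigma^2 / 2), which is strictly larger
mu, sigma = 2.0, 0.8
y = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

# "train in log space": the best constant under squared error
# is the mean of log1p(y); invert it back with expm1
pred = np.expm1(np.mean(np.log1p(y)))

# pred sits much closer to the median of y than to its mean
```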

Good default when

  • Targets are counts/prices/sales/traffic/demand and you care about relative error

  • Targets have a heavy right tail and you want evaluation that doesn’t get dominated by the largest cases

10) Common pitfalls and diagnostics#

  • Invalid negatives: RMSLE is not defined for negative values in most libraries; enforce \(\hat y \ge 0\) (model choice or clipping).

  • Zero-heavy targets: inspect performance separately on \(y=0\) vs \(y>0\); RMSLE can behave differently near zero.

  • Compare metrics: always compare RMSLE with RMSE/MAE; choose based on the cost of absolute vs relative errors.

  • Inspect residuals in log space: if you optimize for RMSLE, plot \(\log(1+\hat y) - \log(1+y)\), not only \(\hat y - y\).

  • Remember the +1: the “relative error” intuition is best when targets are not tiny.
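For instance, a prediction off by a factor of 2 is penalized very differently at large vs tiny scales (quick sketch):

```python
import numpy as np

# factor-of-2 error on a large target: close to (log 2)^2 ≈ 0.48
big = (np.log1p(200.0) - np.log1p(100.0)) ** 2

# same factor-of-2 error on a tiny target: the +1 dominates,
# and the penalty collapses to roughly (0.01)^2
tiny = (np.log1p(0.02) - np.log1p(0.01)) ** 2
```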

Exercises#

  1. Add support for sample_weight and explicit multioutput weights to the plotting examples (do some outputs matter more?).

  2. Create a dataset where the true noise is additive (not multiplicative) and compare RMSE vs RMSLE behavior.

  3. Show that for large \(y\), MSLE is approximately the squared log-ratio: \((\log(\hat y/y))^2\).

References#

  • scikit-learn metrics API: https://scikit-learn.org/stable/api/sklearn.metrics.html

  • Kaggle discussions on RMSLE (common for count/price targets)